[fix](be) Fix nested array_map lambda captures#64563
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
9bc9cf0 to
9a2a8ec
Compare
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 29516 ms |
TPC-DS: Total hot run time: 172456 ms |
ClickBench: Total hot run time: 25.16 s |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Automated review completed. I found one correctness issue requiring changes.
Critical checkpoint conclusions:
- Goal/test coverage: the PR addresses nested array_map binding and adds focused BE, FE, and regression coverage, but the submitted inline issue shows a remaining nested-lambda path in array_sort comparators that is neither covered nor handled correctly.
- Scope/focus: the main changes are focused around lambda metadata and BE lambda execution; the remaining issue is in a parallel lambda execution path touched by the PR.
- Concurrency/lifecycle: no new concurrency, locking, or lifecycle risk was found in the changed code.
- Configuration/compatibility: no config was added. The thrift field is optional; old-FE missing lambda metadata is intentionally fail-fast in new BE, and I did not find a separate actionable compatibility issue beyond the submitted inline comment.
- Parallel paths: nested array_map captures are handled, but nested lambdas inside array_sort comparators still cannot capture comparator-local arguments.
- Test/style state: Linux compile, BE UT, P0 regression, CheckStyle, and clang-format checks are passing in GitHub checks. The macOS BE UT failure appears environmental (JDK 25 where the job requires JDK 17). The external coverage check is failing, but I could not tie it to a concrete code-review finding from the available evidence.
Subagent conclusions:
- optimizer-rewrite proposed OPT-1; I independently verified it and submitted it as the inline comment.
- tests-session-config found no new candidate.
- Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the current ledger/comment set after the last candidate update.
User focus: no additional user-provided review focus was present.
| return Status::OK(); | ||
| } | ||
|
|
||
| if (expr->is_slot_ref() || expr->is_virtual_slot_ref()) { |
There was a problem hiding this comment.
This skips the nested lambda body, but the comparator frame installed during execution has bind_by_name = false, parent_bindings_visible = false, and no argument bindings. For a valid lexical shape such as array_sort((x, y) -> array_map(z -> z + x, [1, 2]), arr), the x captured by the nested array_map is not processed by this walker. At execution the inner array_map pushes a named z frame; VColumnRef("x") searches that frame, then hits the anonymous comparator frame and stops, so _get_column_position() returns -1 instead of resolving the comparator's current x. That turns a capture of the comparator's own argument into an internal "input block not contain column ref" error. Please either make comparator arguments available to nested lambda frames without breaking the direct x/y ordinal handling, or reject this nested-capture shape during prepare.
|
PR approved by at least one committer and no changes requested. |
| virtual doris::Status prepare(RuntimeState* state) { | ||
| virtual doris::Status prepare(RuntimeState* state, const VExprSPtrs& children) { | ||
| static_cast<void>(children); | ||
| batch_size = state->batch_size(); |
yx-keith
left a comment
There was a problem hiding this comment.
The name-based binding rewrite (serializing lambda argument names and resolving by a runtime scope stack) is sound, and the lifetime is safe — Frame/Binding hold only a std::string name and an int column position by value (no column pointer), columns are re-fetched by position from the live block, and FrameGuard is a move-only RAII push/pop. Two issues:
-
array_sortcomparator with a nestedarray_mapcapturing the comparator argument fails at runtime. Invarray_sort_function.cpp,_set_comparator_argument_gaprecurses fromi=1and skips the nested lambda body, so aVColumnRefto the comparator argument never gets its gap set. During execution the innerarray_mappushes a named frame while the comparator frame hasbind_by_name=false/parent_bindings_visible=false, so resolution returns position-1and the query fails with an opaque "input block not contain column ref" error. This is valid SQL (e.g.array_sort((x,y) -> array_map(z -> z + x, ...), arr)); please reject it explicitly duringpreparewith a clear message instead of failing at execution. -
Performance regression: all input columns are materialized.
varray_map_function.cppsets the lambda argument base toblock->columns()and repeats every input column (convert_to_full_column_if_const+insert_many_fromper array-row per batch), whereas the previous code only materialized columns the lambda body references. For wide tables (INSERT ... SELECTwitharray_map) this raises CPU/memory noticeably. This is a deliberate trade-off for correctness (deeply-nested references can't be found by static analysis), but it should be called out in the PR description, and ideally narrowed to referenced columns plus any nested-capture set.
Issue Number: None
Related PR: None
Problem Summary: Nested lambda expressions can contain lambda argument ColumnRefs with the same column ids as outer lambda arguments. Binding lambda arguments only by column id makes BE unable to distinguish current-scope arguments from captured outer arguments, especially when inner lambdas shadow outer argument names. This change serializes per-lambda argument names through a dedicated Thrift field, uses scope-aware name binding in BE, lets inner lambda argument names shadow outer ones, and keeps a legacy single-layer fallback that infers lambda arguments from ColumnRef ids. For old FE nested lambda plans without lambda metadata, BE now returns a clear error instead of executing ambiguous bindings.
Fix nested lambda argument binding for array_map. Nested lambda plans from old FE versions without lambda metadata now return an error.
- Test:
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.analysis.ExprToThriftBehaviorTest
- Unit Test: ./run-be-ut.sh --run --filter=ArrayMapFunctionTest.*:VColumnRefTest.*
- Regression test: doris-local-regression all -d query_p0/sql_functions/array_functions -s test_nested_array_map -forceGenOut
- Regression test: doris-local-regression start && doris-local-regression run -d query_p0/sql_functions/array_functions -s test_nested_array_map
- Format: git diff --check; build-support/check-format.sh
- Behavior changed: Yes. Nested lambda arguments are resolved by FE-provided argument names, and old FE nested lambda plans without metadata fail fast.
- Does this need documentation: No
bd1ca47 to
7266913
Compare
|
/review |
There was a problem hiding this comment.
I found one correctness issue in the new array_sort nested-lambda rejection path.
Critical checkpoint conclusions:
- Goal/test: The new nested array_map scope model is covered by BE unit tests and a regression suite, but the array_sort comparator shadowing case below is not covered.
- Scope and compatibility: The optional lambda_argument_names behavior was reviewed. The old-FE/no-metadata nested array_map failure is documented by the PR and tests as an intentional fail-fast path, so I did not raise it as a separate issue.
- Parallel/special paths: Direct array_map binding, VColumnRef name resolution, VExprContext cloning, and regression expected outputs look coherent. The remaining problem is specifically the array_sort comparator path where FE distinguishes comparator operands by column id while the new nested-capture rejection checks only names.
- Existing threads: This is distinct from the existing array_sort nested-capture thread because the PR added a rejection, but the new rejection still misses the second comparator operand when it is serialized with the first operand name.
- Verification: Scoped git diff --check over the PR changed-file list is clean. I did not run the full BE/FE build or test suites locally.
Subagent conclusions:
- optimizer-rewrite: OPT-1 was accepted and submitted as the inline comment.
- tests-session-config: TEST-1 was dismissed with evidence because the old-FE/no-metadata nested plan behavior is explicit in the PR body and tests.
- Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.
User focus: No additional user-provided focus was specified.
| // argument instead of an unsupported capture from the array_sort comparator. | ||
| if (expr->is_column_ref()) { | ||
| if (std::ranges::find(in_scope_lambda_argument_names, expr->expr_name()) != | ||
| in_scope_lambda_argument_names.end()) { |
There was a problem hiding this comment.
This local-name check still misses captures of the second comparator argument. visitArraySort() builds the second comparator ref by cloning the first one and changing only column_id to 1, so a body reference to comparator y can arrive here as ColumnRef(name="x", column_id=1). If an inner lambda also declares x, this branch returns OK before looking at the column id, and runtime name resolution binds the ref to the inner x frame. A shape like array_sort((x, y) -> array_map(x -> x + y, [1, 2]), arr) is then accepted and evaluates the capture as x + x instead of rejecting the unsupported comparator capture. Please either make this validation consider comparator column ids, or serialize the second comparator ref with its own name before relying on name-based shadowing.
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
| ref->set_gap(lambda_argument_base + argument_index - ref->column_id()); | ||
| } | ||
| return Status::OK(); | ||
| } |
There was a problem hiding this comment.
else,to remove 414 return
| } | ||
| } | ||
|
|
||
| LambdaExecutionContext& VExprContext::lambda_execution_context() { |
There was a problem hiding this comment.
更好的是 lazy init
LambdaExecutionContext& VExprContext::lambda_execution_context() {
if (!_lambda_execution_context) {
_lambda_execution_context = std::make_unique();
}
return *_lambda_execution_context;
}
|
|
||
| int _column_id; | ||
| std::atomic<int> _gap = 0; | ||
| std::atomic<bool> _gap_set = false; |
There was a problem hiding this comment.
感觉用-1初始化_gap就能规避这个变量了,你看看是否可以
|
|
||
| private: | ||
| int _get_column_position(VExprContext* context, const Block* block) const { | ||
| if (context != nullptr) { |
| int _get_column_position_without_context(const Block* block) const { | ||
| if (!_gap_set.load()) { | ||
| const int position_by_name = _find_column_position_by_name(block); | ||
| if (position_by_name >= 0) { |
| } | ||
|
|
||
| int _find_column_position_by_name(const Block* block) const { | ||
| if (block == nullptr) { |
There was a problem hiding this comment.
这个block也不应是空的,你把这个空指针的check代码都梳理一下
| names.reserve(lambda_argument_base + arguments.size()); | ||
| data_types.reserve(lambda_argument_base + arguments.size()); | ||
| for (int column_id : required_input_column_ids) { | ||
| if (column_id < 0 || block == nullptr || |
Issue Number: None
Related PR: None
Problem Summary: Legacy lambda argument binding updated VColumnRef gap fields on shared expression nodes during execution. Since VExprContext clones share the same root expression tree, that mutable per-execution binding state can race across cloned contexts. This change moves legacy positional lambda bindings into LambdaExecutionContext::Frame, resolves VColumnRef by frame column id for positional frames, and removes the mutable gap fields from VColumnRef. Array sort comparator bindings are also represented as comparator-local positional frame entries.
None
- Test:
- Unit Test: ./run-be-ut.sh --run --filter=VColumnRefTest.*:ArrayMapFunctionTest.*
- Format: git diff --check; build-support/check-format.sh
- Static Analysis: build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (attempted; failed on existing/toolchain diagnostics including missing stddef.h, unmatched NOLINTEND in core/types.h, and pre-existing function-size/complexity warnings)
- Behavior changed: No
- Does this need documentation: No
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: None
Problem Summary: Nested
array_mapexpressions can capture variables from an outer lambda while also defining their own lambda arguments. The BE previously assigned lambda argument column offsets by recursively mutating everyVColumnRefunder the outer lambda. This could make an inner lambda argument reuse the outer lambda block offset and read the wrong column duringINSERT ... SELECT, causing a BE crash. This change serializes lambda argument metadata to BE, preserves captured input columns in lambda blocks, and resolves lambda arguments by the current runtime lambda scope while respecting nested-lambda shadowing.array_sortcomparator arguments are intentionally position-based because FE may distinguish the two comparator arguments only by column id. A comparator that captures an outer lambda argument cannot be represented in the current two-column comparator block and could otherwise be silently resolved as one of the comparator arguments. This PR now returns a clear error for unsupported captured column or slot references inarray_sortcomparators.A follow-up refactor moves
LambdaExecutionContextintobe/src/exprs/lambda_function/lambda_execution_context.h.VExprContextnow keeps the lambda execution context as a private implementation detail and includes the concrete definition only fromvexpr_context.cpp, while lambda functions andVColumnRefinclude the dedicated lambda header where they need the concrete type.Release note
Fix nested
array_maplambda captures and reject unsupported captured references inarray_sortcomparators.Check List (For Author)
./run-be-ut.sh --run --filter=ArrayMapFunctionTest.NamedLambdaWithFewerArgumentsThanArraysUsesDeclaredBindings:ArrayMapFunctionTest.*:VColumnRefTest.*./run-be-ut.sh --run --filter=ArrayMapFunctionTest.NestedArraySortComparatorCapturingOuterArgumentReturnsError:ArrayMapFunctionTest.NestedArraySortInsideArrayMapSkipsArrayMapArgumentInference./run-be-ut.sh --run --filter=ArrayMapFunctionTest.*./build.sh --be --fe./build.sh --be./run-regression-test.sh --run -d query_p0/sql_functions/array_functions -s test_nested_array_map_insert -forceGenOut./run-regression-test.sh --run -d query_p0/sql_functions/array_functions -s test_nested_array_map_insert./run-regression-test.sh --run -d query_p0/sql_functions/array_functions -s test_nested_array_map./run-regression-test.sh --run -d query_p0/virtual_slot_ref -s fix_array_type_and_lambda_func./run-regression-test.sh --run -d query_p0/virtual_slot_ref./run-regression-test.sh --run -d query_p0/sql_functions/array_functions./build-support/clang-format.sh;./build-support/check-format.sh;git diff --check./build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN --base HEAD, but local clang-tidy analysis was blocked by existing/local diagnostics including missingstddef.h, unmatchedNOLINTENDinbe/src/core/types.h, and pre-existing function-size/cognitive-complexity diagnostics.build-support/run-clang-tidy.sh --build-dir be/build_Release, but it is still blocked by the same existing/toolchain diagnostics.array_sortcomparators now return an error instead of being silently position-bound.